Aid to discovery of new protein foldings
Abstract
Large-scale sequencing projects are producing explosive growth in the number of known protein sequences. The current number is about 36,000 sequences [Bairoch & Boeckmann, 92], but well over 100,000 will have to be dealt with before the end of the century. This contrasts with the far slower increase in the number of known protein structures, currently about 2,000 [Bernstein et al., 77]. The rates of increase are roughly 100 sequences per day and 1.5 3D structures per day. It is therefore increasingly important to develop computational approaches that automatically determine (predict) the structure of proteins whose sequences are known. Because the general problem of protein fold prediction is so difficult, researchers have tried to predict the regular substructures that form the imaginary level of protein structural description known as secondary structure. Knowledge of the secondary structure can contribute significantly towards the goal of tertiary fold prediction. This knowledge can constrain the possible conformations of the protein [Cohen & Kuntz, 89], provide a good starting point and reduce the search space in the simulation of protein folding by molecular dynamics [Levitt, 83] or lattice models [Skolnick & Kolinski, 90], or be used in predicting higher-order structures (e.g. super-secondary structures [Taylor & Thornton, 83], domains [Lathrop, 87]). The established methods for protein secondary structure prediction include hand-crafted expert rules [Lim, 74], biological predictive patterns [Cohen et al., 83, 86] [Presnell et al., 92], the statistical Chou-Fasman method [Chou & Fasman, 74], and the information-theory-based GOR method [Garnier et al., 78]. More recent methods often make use of inductive learning techniques, whereby a system is trained on a set of sample proteins of known conformation and then uses what it has learned to predict the structure of previously unseen proteins.
Both neural networks [Qian & Sejnowski, 88] [Kneller, 90] [Zhang et al., 92] and symbolic induction [King & Sternberg, 90] [Muggleton et al., 92] have been applied to secondary structure prediction. Despite the apparent practical importance of the secondary structure concept, a quarter of a century of research has revealed a limit on secondary structure prediction accuracy. Although this limit has recently been raised to as much as 70% [Cost & Salzberg, 93] [Leng & Buchanan, 93] [Rost & Sander, 93], this accuracy is still too low to be of practical use in constraining the conformation space for tertiary structure prediction. The main reason for the failure of secondary structure prediction methods is that the formation of structure (including secondary structure) is due only in part to sequentially local interactions between amino acids, yet most methods known to date rely on local information. It is becoming widely recognized that dealing properly with the protein structure prediction problem requires tackling the representation issues differently. Several attempts have been made to change the representations used in the secondary structure prediction problem. The protein sequence, usually represented as a string of amino acids, has been examined only in terms of the physico-chemical properties of amino acids [Hunter, 91] [Cherkauer & Shavlik, 93]. The secondary structure elements have been replaced by alternative classes of local structure, which account for the recognized helical and strand regions as well as for novel categories such as the N- and C-caps of helices and strands [Zhang et al., 93]. However, these first works addressing representation issues do not significantly improve on the state of the art. Clearly, much more work on representation is needed.
Biological knowledge representation is hard for several reasons: the tricky nature and ill-formalized character of the available biological knowledge; the lack of general theoretical understanding in the field; and the fact that this knowledge comes in a raw form.
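
To make concrete what a purely local representation in terms of a physico-chemical property looks like, the following is a minimal sketch and not the method of any of the cited works. It assumes the Kyte-Doolittle hydropathy scale as the property (the abstract does not name a specific scale), and the function names, the `half_width` parameter, and the zero padding at the chain ends are all hypothetical choices made for illustration. Each residue is described only by the property values of its sequence neighbours, which is exactly the kind of input that cannot capture long-range interactions.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Using this particular scale is an assumption for illustration only.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence: str, center: int,
                    half_width: int = 6, pad: float = 0.0) -> List[float]:
    """Describe the residue at `center` by the hydropathy of its neighbours.

    The window spans `half_width` residues on each side; positions that
    fall outside the chain are filled with `pad` (a hypothetical choice).
    """
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.append(KYTE_DOOLITTLE[sequence[pos]])
        else:
            features.append(pad)
    return features


def encode_sequence(sequence: str, half_width: int = 6) -> List[List[float]]:
    """Return one local feature vector per residue of the sequence."""
    return [window_features(sequence, i, half_width)
            for i in range(len(sequence))]


if __name__ == "__main__":
    # Each row is the input a window-based learner would see for one residue.
    for row in encode_sequence("ACD", half_width=1):
        print(row)
```

A learner trained on such vectors sees nothing beyond `half_width` residues from the position being predicted, so two identical windows in different proteins always receive the same prediction regardless of their tertiary context.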